-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a fast-path to Debug
ASCII &str
#121150
Conversation
While we are still bikeshedding the implementation details, I would appreciate a benchmark run. Not sure if the compiler performance testsuite will show any change, in a micro-benchmark, I was able to get a 10x improvement for a pure-ASCII string, and even a tiny improvement in the "pure"-Unicode case (because even a "pure" unicode sentence still contains ASCII spaces) |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Add a fast-path to `Debug` ASCII `&str` Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping. --- This is an alternative / a companion to rust-lang#121138. The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (d3c44b1): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 640.535s -> 641.862s (0.21%) |
Not too surprisingly, the compile time benchmarks did not budge, or look bogus. Though if you select "runtime", there is a very significant improvement to So the question remains, what can I do to improve the confidence in this change? I have looked through other similar existing methods, but none of which do exactly what I expect. Another option would be to also perf-test #121138, as in my local benchmarking I found that the slowness came from binary searching the |
Let's see what this now does on top of #121138... @bors try @rust-timer queue And for good measure, let's mention that #122013 is also making changes around this. |
This comment has been minimized.
This comment has been minimized.
Add a fast-path to `Debug` ASCII `&str` Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping. --- This is an alternative / a companion to rust-lang#121138. The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (d253c69): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 644.65s -> 646.087s (0.22%) |
Another -26.93% on the |
AIUI you're still investigating... |
I rebased, reduced the number of @rustbot ready |
This comment has been minimized.
This comment has been minimized.
Instead of writing each `char` of an escape sequence one by one, this delegates to `Display`, which uses `write_str` internally in order to write the whole escape sequence at once.
Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.
Instead of having a single loop that works on utf-8 `char`s, this splits the implementation into a loop that quickly skips over printable ASCII, falling back to per-char iteration for other chunks.
Surprisingly, benchmarks have shown that using `&str` instead of `&[u8]` with some `unsafe` code is actually faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great and is ready to merge (modulo the comment nit), thank you!
I have some ideas on how to optimize this further (I don't think the len_utf8
gets optimized away, even though it is redundant), but I think we can leave that for a follow-up PR...
This avoids having to collect a non-ASCII-printable run before processing it.
Thank you! |
☀️ Test successful - checks-actions |
Finished benchmarking commit (213ad10): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (secondary 4.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -2.4%, secondary -4.4%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary 0.0%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 673.01s -> 673.59s (0.09%) |
Add benchmarks for `impl Debug for str` In order to inform future perf improvements and prevent regressions, lets add some benchmarks that stress `impl Debug for str`. --- As I am currently working on improving the perf in rust-lang/rust#121150, its nice to have these benchmarks. Writing them, I also saw that escapes are written out one char at a time, even though other parts of the code are already optimizing that via `as_str`, which I intend to do as well as a followup improvement. r? ``@cuviper`` ☝🏻 as you were also assigned to rust-lang/rust#121150, CC ``@the8472`` if you want to steal the review :-)
Instead of going through the
EscapeDebug
machinery, we can just skip over ASCII chars that don’t need any escaping.This is an alternative / a companion to #121138.
The other PR is adding the fast path deep within
EscapeDebug
, whereas this skips as early as possible.